Analyzing Approximate Value Iteration Algorithms

Abstract

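In this paper, we consider the stochastic iterative counterpart of the value iteration scheme wherein only noisy and possibly biased approximations of the Bellman operator are available. We call this counterpart the approximate value iteration (AVI) scheme. Neural networks are often used as function approximators, in order to counter Bellman's curse of dimensionality. In this paper, they are used to approximate the Bellman operator. Because neural networks are typically trained using sample data, errors and biases may be introduced. The design of AVI accounts for implementations with biased approximations of the Bellman operator and sampling errors. We present verifiable sufficient conditions under which AVI is stable (almost surely bounded) and converges to a fixed point of the approximate Bellman operator. To ensure the stability of AVI, we present three different yet related sets of sufficient conditions that are based on the existence of an appropriate Lyapunov function. These Lyapunov function-based conditions are easily verifiable and new to the literature. The verifiability is enhanced by the fact that a recipe for the construction of the necessary Lyapunov function is also provided. We also show that the stability analysis of AVI can be readily extended to the general case of set-valued stochastic approximations. Finally, we show that AVI can also be used in more general circumstances, that is, for finding fixed points of contractive set-valued maps.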
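As a rough illustration of the kind of iteration the abstract describes, the following Python sketch runs a stochastic update driven by a biased, noisy stand-in for the Bellman operator. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-state MDP, the constant bias standing in for neural-network approximation error, the Gaussian noise, and the step size a(n) = (n+1)^(-0.6).

```python
import random

# Hypothetical 2-state, 2-action MDP; all numbers are illustrative assumptions.
# P[s][a] lists next-state probabilities, R[s][a] is the immediate reward.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.5, 0.5], [0.1, 0.9]]]
R = [[1.0, 0.0],
     [0.0, 2.0]]
GAMMA = 0.9


def bellman(v):
    """Exact Bellman optimality operator T applied to the value vector v."""
    return [max(R[s][a] + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][a]))
                for a in range(2))
            for s in range(2)]


def approx_bellman(v, bias=0.05):
    """Assumed stand-in for a learned approximation of T: T plus a constant bias."""
    return [x + bias for x in bellman(v)]


def avi(steps=20000, noise=0.1, seed=0):
    """Iterate x_{n+1} = x_n + a(n) * (A x_n - x_n + M_{n+1}).

    A is approx_bellman, M_{n+1} is zero-mean Gaussian noise, and
    a(n) = (n + 1) ** -0.6 is an assumed diminishing step size.
    """
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for n in range(steps):
        a_n = (n + 1) ** -0.6
        ax = approx_bellman(x)
        x = [xi + a_n * (axi - xi + rng.gauss(0.0, noise))
             for xi, axi in zip(x, ax)]
    return x


def fixed_point(op, iters=2000):
    """Plain deterministic iteration of a contraction, used as a reference."""
    v = [0.0, 0.0]
    for _ in range(iters):
        v = op(v)
    return v


x = avi()                                 # noisy, biased stochastic iteration
v_approx = fixed_point(approx_bellman)    # fixed point of the biased operator
v_exact = fixed_point(bellman)            # fixed point of the exact operator
```

In this toy setting the noisy iterates settle near the fixed point of the biased operator rather than that of the exact one. With a constant bias b the two fixed points differ by b / (1 - GAMMA) in every state.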

Related Resources

Topological Value Iteration Algorithms

Value iteration is a powerful yet inefficient algorithm for Markov decision processes (MDPs) because it puts the majority of its effort into backing up the entire state space, which turns out to be unnecessary in many cases. In order to overcome this problem, many approaches have been proposed. Among them, ILAO* and variants of RTDP are state-of-the-art ones. These methods use reachability anal...

Approximate Value Iteration with Temporally Extended Actions

Temporally extended actions have proven useful for reinforcement learning, but their duration also makes them valuable for efficient planning. The options framework provides a concrete way to implement and reason about temporally extended actions. Existing literature has demonstrated the value of planning with options empirically, but there is a lack of theoretical analysis formalizing when pla...

Feature-Discovering Approximate Value Iteration Methods

Sets of features in Markov decision processes can play a critical role in approximately representing value and in abstracting the state space. Selection of features is crucial to the success of a system and is most often conducted by a human. We study the problem of automatically selecting problem features, and propose and evaluate a simple approach reducing the problem of selecting a new featu...

Error Bounds for Approximate Value Iteration

Approximate Value Iteration (AVI) is a method for solving a Markov Decision Problem by making successive calls to a supervised learning (SL) algorithm. A sequence of value representations V_n is processed iteratively by V_{n+1} = A T V_n, where T is the Bellman operator and A is an approximation operator. Bounds on the error between the performance of the policies induced by the algorithm and the optimal...
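The recursion V_{n+1} = A T V_n can be sketched in a few lines of Python. The sketch below is only an assumed toy instance: the four-state chain MDP is hypothetical, and the approximation operator A is taken to be state aggregation (averaging values within fixed groups) as a simple stand-in for a supervised learning step. It is not the construction analyzed in the paper.

```python
GAMMA = 0.9
N = 4  # number of states in a hypothetical chain MDP


def step(s, a):
    """Deterministic transition: action 0 moves left, action 1 moves right."""
    return max(s - 1, 0) if a == 0 else min(s + 1, N - 1)


def reward(s, a):
    """Reward 1 for choosing 'right' in the last state, 0 otherwise."""
    return 1.0 if (s == N - 1 and a == 1) else 0.0


def T(v):
    """Bellman optimality operator for the chain."""
    return [max(reward(s, a) + GAMMA * v[step(s, a)] for a in (0, 1))
            for s in range(N)]


# Assumed approximation architecture: states in a group share one value.
GROUPS = [[0, 1], [2, 3]]


def A(v):
    """Approximation operator: replace each value by its group average."""
    out = [0.0] * N
    for group in GROUPS:
        mean = sum(v[s] for s in group) / len(group)
        for s in group:
            out[s] = mean
    return out


def avi(iters=500):
    """Iterate V_{n+1} = A(T(V_n)) starting from the zero function."""
    v = [0.0] * N
    for _ in range(iters):
        v = A(T(v))
    return v


v_hat = avi()
```

Averaging is a non-expansion in the sup norm, so the composition of A and T remains a contraction here and the iteration settles at a fixed point of A T. With a general function approximator this need not hold, which is why error bounds of the kind the abstract mentions are of interest.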

Restricted Value Iteration: Theory and Algorithms

Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-...

Journal

Journal title: Mathematics of Operations Research

Year: 2022

ISSN: 0364-765X, 1526-5471

DOI: https://doi.org/10.1287/moor.2021.1202